Annotation for and Robust Parsing of Discourse Structure on Unrestricted Texts

Authors

  • Jason Baldridge
  • Nicholas Asher
  • Julie Hunter

Abstract

Predicting discourse structure on naturally occurring texts and dialogs is challenging and computationally intensive. Attempts to construct hand-built systems have run into problems both in specifying the required knowledge and in performing the necessary computations efficiently. Data-driven approaches have recently been shown to handle challenging aspects of discourse successfully without requiring much fine-grained semantic detail, but they require annotated material for training. We describe our effort to annotate Segmented Discourse Representation Structures (SDRSs) on Wall Street Journal texts, arguing that graph-based representations are necessary for adequately capturing the dependencies found in the data. We then explore two data-driven parsing strategies for recovering discourse structures. We show that the generative PCFG model of B&L is inherently limited by its inability to incorporate new features when learning from small data sets. We also show how recent developments in dependency parsing and discriminative learning can be used to get around this problem and thereby improve parsing accuracy. We report results from exploratory experiments on Verbmobil dialogs and on our annotated newswire texts. These results suggest that these methods do indeed enhance performance and could yield significant further improvements with richer feature sets.
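The contrast between a fixed generative model and a discriminative learner that can absorb arbitrary new features can be illustrated with a toy discourse dependency parser. What follows is a minimal sketch under stated assumptions, not the authors' model. It makes several simplifying choices:

- Segmentation into elementary discourse units (EDUs) is taken as given.
- Each unit attaches to exactly one earlier unit.
- Relation labels are omitted.
- The feature templates in the hypothetical `arc_features` helper are invented for illustration.
- A structured perceptron stands in for whatever discriminative learner one might use.

Note that this sketch produces trees, which is a simplification of the graph-based SDRSs argued for above.

```python
from collections import defaultdict

# Toy setup (assumption): a document is a list of EDU strings, and a parse
# assigns every EDU i > 0 a single head j < i; EDU 0 acts as the root.


def arc_features(edus, head, dep):
    """Hypothetical feature templates for a head -> dependent arc.
    A discriminative model can add overlapping features like these freely."""
    h_words = edus[head].lower().split()
    d_words = edus[dep].lower().split()
    return [
        f"dist={min(dep - head, 5)}",            # bucketed distance between units
        f"dep_first={d_words[0]}",               # first word of dependent (cue phrase)
        f"head_first={h_words[0]}",              # first word of candidate head
        f"pair_first={h_words[0]}|{d_words[0]}", # conjunction of the two
        "adjacent" if dep - head == 1 else "non_adjacent",
    ]


def score(weights, feats):
    """Linear arc score: sum of the weights of the active features."""
    return sum(weights[f] for f in feats)


def decode(weights, edus):
    """Choose the highest-scoring earlier head for each EDU.
    Because heads always precede dependents, any such choice is a tree rooted
    at EDU 0, so independent per-unit argmax is exact for this arc-factored model."""
    heads = [None]
    for dep in range(1, len(edus)):
        best = max(range(dep), key=lambda h: score(weights, arc_features(edus, h, dep)))
        heads.append(best)
    return heads


def train_perceptron(data, epochs=10):
    """Structured perceptron: for each wrongly attached unit, add the gold
    arc's features and subtract the predicted arc's features."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for edus, gold in data:
            pred = decode(weights, edus)
            for dep in range(1, len(edus)):
                if pred[dep] != gold[dep]:
                    for f in arc_features(edus, gold[dep], dep):
                        weights[f] += 1.0
                    for f in arc_features(edus, pred[dep], dep):
                        weights[f] -= 1.0
    return weights
```

For example, calling `train_perceptron([(edus, gold)])` on a four-unit toy document learns weights under which `decode` recovers the gold attachments. Adding a new cue-phrase or distance feature only means returning one more string from `arc_features`; no change to the model's generative story is needed. The same point applies to the PCFG limitation discussed above.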

Publication date: 2007